Dynamic profiling intel sync #78
base: dynamic_profiling
Conversation
…using master branch
Intel sync
Adjust field name in hostheartbeat queries
… multiple processes
…ests alongside other project services
…ons for docker compose
Set debug port in docker
Add metrics publisher to indexer with S3 endpoint configuration
Max Profiling Frequency and Profiler Selection in Dynamic Profiling
Force-pushed from 8b18493 to 86e6ca7
- Add PostgreSQL schemas for dynamic profiling tables
- Add Python Pydantic models for API requests/responses
- Add 4 REST API endpoints for dynamic profiling:
  * POST /api/metrics/profile_request
  * POST /api/metrics/heartbeat
  * POST /api/metrics/command_completion
  * GET /api/metrics/profiling/host_status
- Add comprehensive documentation and setup guides
- Add unit tests for profiling host status functionality
- Includes Slack notification support for profiling events

This contribution provides the complete dynamic profiling feature for Intel's gprofiler-performance-studio, enabling start/stop profiling commands, heartbeat monitoring, and host status tracking.

Co-authored-by: Pinterest Engineering <[email protected]>
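For orientation, here is a minimal client-side sketch of the request/heartbeat flow these endpoints enable. The endpoint paths and the localhost base URL come from this PR; the payload field names, credentials, and port are illustrative assumptions rather than the actual request schema.

```python
# Hypothetical client for the dynamic profiling endpoints added in this PR.
# Field names in the JSON payloads are assumptions, not the real schema.
import requests

BASE = "http://localhost:8080"
AUTH = ("user", "password")  # placeholder credentials

# Ask the backend to start profiling a host.
resp = requests.post(
    f"{BASE}/api/metrics/profile_request",
    json={"service_name": "my-service", "hostname": "host-1", "command": "start"},
    auth=AUTH,
    timeout=10,
)
print("profile_request:", resp.status_code)

# An agent would periodically report liveness via the heartbeat endpoint...
requests.post(
    f"{BASE}/api/metrics/heartbeat",
    json={"service_name": "my-service", "hostname": "host-1"},
    auth=AUTH,
    timeout=10,
)

# ...and operators can query the aggregated host status.
status = requests.get(f"{BASE}/api/metrics/profiling/host_status", auth=AUTH, timeout=10)
print("host_status:", status.status_code, status.json())
```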
Force-pushed from 86e6ca7 to eb4cc45
Remove files that are specific to Pinterest's infrastructure:
- .github/workflows/sync-upstream.yml (Pinterest CI/CD)
- DEPLOYMENT_CHECKLIST.md (Pinterest deployment docs)
- deploy/localstack_init/01_init_s3_sqs.sh (Pinterest testing)
- docs/METRICS_PUBLISHER_INDEXER_DOCUMENTATION.md (Pinterest metrics)
- docs/SLI_METRICS_IMPLEMENTATION.md (Pinterest SLI metrics)
- src/gprofiler/backend/utils/metrics_publisher.py (Pinterest code)
- src/gprofiler_indexer/metrics_publisher.go (Pinterest code)

These files are not part of the dynamic profiling contribution to Intel.
- Remove MetricsPublisher import from main.py
- Remove startup/shutdown event handlers for MetricsPublisher
- Remove all send_sli_metric() calls from profiles_routes.py

These are Pinterest-specific metrics tracking not needed for the Intel PR.
- Remove MetricsPublisher configuration from args.go
- Remove metricsPublisher initialization from main.go
- Remove all SendSLIMetric() calls from queue.go and worker.go
- Remove related comments

These are Pinterest-specific metrics tracking not needed for the Intel PR.
Pull request overview
This PR implements the foundational infrastructure for a dynamic profiling feature in gProfiler Performance Studio, introducing PostgreSQL database schemas, Python API models, a comprehensive testing framework, and a frontend UI for remote profiling control.
Key Changes:
- PostgreSQL database schemas for profiling requests, commands, heartbeats, and execution tracking
- Pydantic API models for type-safe profiling operations (start/stop, host-level, process-level)
- Comprehensive test infrastructure with Docker Compose orchestration and pytest-based unit/integration tests
- Frontend UI for profiling status monitoring with filtering, auto-refresh, and profiler configuration
Reviewed changes
Copilot reviewed 70 out of 73 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/setup/postgres/gprofiler_recreate.sql | Adds 4 core tables (ProfilingRequests, ProfilingCommands, HostHeartbeats, ProfilingExecutions) with indexes for dynamic profiling |
| src/gprofiler/backend/models/metrics_models.py | Defines Pydantic models for profiling requests, heartbeats, and command completion with validation |
| src/gprofiler/backend/routers/metrics_routes.py | Implements 4 REST endpoints for profiling lifecycle management (create request, heartbeat, completion, status) |
| test/docker-compose.yml | Docker Compose configuration for test environment with all service dependencies |
| src/tests/unit/backend/test_profiling_host_status.py | 22 unit tests validating filtering, performance, and data consistency for host status endpoint |
| src/tests/integration/backend_db/test_create_profile_request.py | 9 end-to-end integration tests covering profiling workflows from API to database |
| src/gprofiler/frontend/src/components/console/ProfilingStatusPage.jsx | React component implementing profiling control UI with real-time updates and advanced configuration |
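To make the table above more concrete, below is a minimal sketch of the kind of Pydantic model and FastAPI route that metrics_models.py and metrics_routes.py add. The field names, defaults, and handler body are illustrative assumptions; the real files define the actual schema and persist data to the new PostgreSQL tables.

```python
# Illustrative sketch only - field names and behavior are assumptions,
# not the code under review.
from typing import Optional

from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/api/metrics")


class HeartbeatRequest(BaseModel):
    service_name: str
    hostname: str
    ip_address: Optional[str] = None
    profiling_frequency: int = Field(default=11, ge=1, description="Sampling frequency in Hz")


class HeartbeatResponse(BaseModel):
    message: str
    # A pending start/stop command for this host, if one exists.
    profiling_command: Optional[dict] = None


@router.post("/heartbeat", response_model=HeartbeatResponse)
def heartbeat(req: HeartbeatRequest) -> HeartbeatResponse:
    # The real endpoint records the heartbeat in the HostHeartbeats table and
    # returns any pending command; this sketch only acknowledges the request.
    return HeartbeatResponse(message="ok")
```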
    AWS_SESSION_TOKEN=

    # postgres:
    TEST_TEST=
Copilot AI · Dec 2, 2025
The variable name TEST_TEST is unclear and appears to be a placeholder or typo. Consider renaming to something more descriptive like TEST_MODE or removing if unused.
Suggested change (delete this line):

    TEST_TEST=
    Region: aws.String(args.AWSRegion),
    Endpoint: aws.String(args.AWSEndpoint),
Copilot AI · Dec 2, 2025
The addition of S3ForcePathStyle: aws.Bool(true) changes S3 path handling behavior but lacks explanation. Add a comment explaining why path-style addressing is required (e.g., for LocalStack compatibility or S3-compatible services).
Suggested change:

    Endpoint: aws.String(args.AWSEndpoint),
    // Use path-style addressing for S3 to support custom endpoints (e.g., LocalStack, MinIO, or other S3-compatible services)
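The change under discussion is in the Go indexer (aws-sdk-go's S3ForcePathStyle). As a side-by-side illustration of the same idea, here is roughly how path-style addressing is enabled in Python with boto3; the endpoint URL and credentials are placeholder values for a LocalStack-style setup.

```python
# Path-style addressing (http://endpoint/bucket/key rather than
# http://bucket.endpoint/key) lets custom endpoints such as LocalStack or
# MinIO work without per-bucket virtual-host DNS.
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # placeholder LocalStack endpoint
    aws_access_key_id="test",              # placeholder credentials
    aws_secret_access_key="test",
    region_name="us-east-1",
    config=Config(s3={"addressing_style": "path"}),
)
print(s3.list_buckets())
```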
src/gprofiler_indexer/callstacks.go (Outdated)
| log.Infof("DEBUG: Sending metric record to channel - ServiceId=%d, HostName=%s, HTMLPath=%s", | ||
| serviceId, hostname, path) | ||
| pw.metricsRecords <- metricRecord | ||
| log.Infof("DEBUG: Metric record sent to channel successfully") |
Copilot AI · Dec 2, 2025
Debug logging statements should be removed before merging to production. These appear to be temporary debugging code added during development.
| log.Infof("DEBUG: Metric record sent to channel successfully") |
src/gprofiler_indexer/callstacks.go (Outdated)
| log.Infof("DEBUG: hostname=%s, htmlBlobPath='%s', CPUAvg=%f, MemoryAvg=%f", | ||
| fileInfo.Metadata.Hostname, htmlBlobPath, fileInfo.Metrics.CPUAvg, fileInfo.Metrics.MemoryAvg) | ||
|
|
||
| if htmlBlobPath != "" || (fileInfo.Metrics.CPUAvg != 0 && fileInfo.Metrics.MemoryAvg != 0) { | ||
| log.Infof("DEBUG: Writing metrics for hostname=%s", fileInfo.Metadata.Hostname) | ||
| pw.writeMetrics(uint32(serviceId), fileInfo.Metadata.CloudInfo.InstanceType, | ||
| fileInfo.Metadata.Hostname, timestamp, fileInfo.Metrics.CPUAvg, | ||
| fileInfo.Metrics.MemoryAvg, htmlBlobPath) | ||
| } else { | ||
| log.Infof("DEBUG: SKIPPING metrics write for hostname=%s - condition failed", fileInfo.Metadata.Hostname) |
Copilot AI · Dec 2, 2025
Multiple DEBUG logging statements should be removed or converted to proper debug level logging before production deployment.
| log.Infof("DEBUG: hostname=%s, htmlBlobPath='%s', CPUAvg=%f, MemoryAvg=%f", | |
| fileInfo.Metadata.Hostname, htmlBlobPath, fileInfo.Metrics.CPUAvg, fileInfo.Metrics.MemoryAvg) | |
| if htmlBlobPath != "" || (fileInfo.Metrics.CPUAvg != 0 && fileInfo.Metrics.MemoryAvg != 0) { | |
| log.Infof("DEBUG: Writing metrics for hostname=%s", fileInfo.Metadata.Hostname) | |
| pw.writeMetrics(uint32(serviceId), fileInfo.Metadata.CloudInfo.InstanceType, | |
| fileInfo.Metadata.Hostname, timestamp, fileInfo.Metrics.CPUAvg, | |
| fileInfo.Metrics.MemoryAvg, htmlBlobPath) | |
| } else { | |
| log.Infof("DEBUG: SKIPPING metrics write for hostname=%s - condition failed", fileInfo.Metadata.Hostname) | |
| log.Debugf("DEBUG: hostname=%s, htmlBlobPath='%s', CPUAvg=%f, MemoryAvg=%f", | |
| fileInfo.Metadata.Hostname, htmlBlobPath, fileInfo.Metrics.CPUAvg, fileInfo.Metrics.MemoryAvg) | |
| if htmlBlobPath != "" || (fileInfo.Metrics.CPUAvg != 0 && fileInfo.Metrics.MemoryAvg != 0) { | |
| log.Debugf("DEBUG: Writing metrics for hostname=%s", fileInfo.Metadata.Hostname) | |
| pw.writeMetrics(uint32(serviceId), fileInfo.Metadata.CloudInfo.InstanceType, | |
| fileInfo.Metadata.Hostname, timestamp, fileInfo.Metrics.CPUAvg, | |
| fileInfo.Metrics.MemoryAvg, htmlBlobPath) | |
| } else { | |
| log.Debugf("DEBUG: SKIPPING metrics write for hostname=%s - condition failed", fileInfo.Metadata.Hostname) |
mlim19 left a comment
Just reviewed a few files so far; will continue.
```bash
# Test Case: Create profiling request with PerfSpect enabled
curl -X POST http://localhost:8080/api/metrics/profile_request \
  -u "prashantpatel:password" \
```
Please delete the user id.
Done
    )
    raise HTTPException(400, {"message": "Failed to register the new service"})
    return ProfileResponse(message="ok", gpid=int(gpid))
    # Server error - counts against SLO raise HTTPException(400, {"message": "Failed to register the new service"})
Please check if commenting out "raise" and "return" statements is intentional.
Hi @mlim19, the raise and return statements are not commented out - they are fully active and functional.
The confusion may be coming from how GitHub displays the diff. The removed MetricsPublisher code (between the comment and the raise statement) might make it appear compressed in the diff view.
- Removed metricsPublisher initialization and cleanup calls from main.go
- This completes the cleanup of Pinterest-specific metrics code
- Indexer now builds successfully in Docker
- Fixed multiple statements on the same line (syntax errors from previous cleanup)
- Split logger.error() and raise HTTPException() onto separate lines
- Fixed indentation after except blocks
- All services now build and run successfully
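For illustration, a minimal sketch of the kind of fix this commit describes, with a hypothetical handler around the error path (not the actual project code):

```python
import logging

from fastapi import HTTPException

logger = logging.getLogger(__name__)


def register_service(succeeded: bool) -> None:
    if not succeeded:
        # Broken form from the earlier cleanup - two statements jammed onto one
        # line, which is a syntax error in Python:
        #   logger.error("Failed to register the new service") raise HTTPException(400, {...})
        # Fixed form - each statement on its own line:
        logger.error("Failed to register the new service")
        raise HTTPException(400, {"message": "Failed to register the new service"})
```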
skamerintel left a comment
PR #78 - Required Fixes Before Approval
Date: December 16, 2024
PR: #78
The following issues must be resolved before this PR can be approved for merge:
1. DEPLOYMENT CONFIGURATION (deploy/.env)
ISSUE 1.a: AWS Configuration Hardcoded
- Lines 7, 35-38: AWS_REGION, BUCKET_NAME, and SQS URLs are hardcoded
- REQUIRED FIX: Keep these fields empty/configurable as they were in master branch
- RATIONALE: S3 bucket names must be globally unique; regions vary by deployment
ISSUE 1.b: LocalStack Configuration in Production File
- Lines 49-68: LocalStack and test metrics configuration
- REQUIRED FIX: Remove lines 49-68 entirely OR move to separate .env.local file
- These include:
- LocalStack endpoints (lines 53-55)
- Metrics configuration with test values (lines 57-67)
- RATIONALE: LocalStack is dev-only; metrics should default to disabled
ISSUE 1.c: Metrics Configuration
- If the metrics feature is retained, METRICS_ENABLED must default to "false" (a minimal sketch of this default follows this section)
- Provide documentation for metrics host setup if this feature is included
- Remove test UUIDs and localhost endpoints
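A minimal sketch of how the "default to disabled" requirement in Issue 1.c can be enforced when the backend reads its configuration; the variable name comes from the review above, while the helper function is illustrative:

```python
import os


def metrics_enabled() -> bool:
    # Metrics stay off unless METRICS_ENABLED is explicitly set to a truthy
    # value; an unset or empty variable means disabled.
    return os.getenv("METRICS_ENABLED", "false").strip().lower() in {"1", "true", "yes"}


if __name__ == "__main__":
    print("metrics enabled:", metrics_enabled())
```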
2. DOCKER COMPOSE SECURITY (deploy/docker-compose.yml)
ISSUE 2.a: PostgreSQL Port Exposed
- Lines 58-59: Exposes database port 5432 to all network interfaces
- REQUIRED FIX: Remove these lines or comment them out
- RATIONALE: Containers communicate internally; external exposure is security risk
ISSUE 2.b: LocalStack as Production Dependency
- Lines 96, 104, 119, 220, 233: LocalStack hardcoded in service configurations
- Lines 242-266: LocalStack service definition
- REQUIRED FIX:
- Remove LocalStack from main docker-compose.yml
- Create separate docker-compose.local.yml for development
- Remove LocalStack from service dependencies
- RATIONALE: Production should use real AWS services, not emulator
3. DATABASE MIGRATION (scripts/setup/postgres/)
ISSUE 3.a: No Migration Strategy
- dynamic_profiling_schema.sql adds 10+ new tables
- No migration scripts for existing deployments
- REQUIRED FIX:
- Provide upgrade script: migrations/add_dynamic_profiling.up.sql
- Provide rollback script: migrations/add_dynamic_profiling.down.sql
- Document migration process
- RATIONALE: Existing production deployments need safe upgrade path
ACCEPTANCE CRITERIA
[ ] deploy/.env has no hardcoded AWS values
[ ] deploy/.env has no LocalStack configuration
[ ] deploy/docker-compose.yml has no exposed database ports
[ ] deploy/docker-compose.yml has no LocalStack dependencies
[ ] Migration scripts provided for schema changes
- Remove hardcoded AWS config and LocalStack from deploy/.env
- Remove LocalStack service from deploy/docker-compose.yml
- Remove exposed PostgreSQL port from docker-compose.yml
- Add database migrations (up/down) for dynamic profiling
@skamerintel I have addressed all review comments.
@mlim19 @skamerintel Can you please review when you get some time? Thank you.
@ashokbytebytego and @prashantbytesyntax I successfully got the changes working on my local machine. It looks good so far, but I will evaluate it together with the changes on the gprofiler agent side.
Add Dynamic Profiling Data Models and PostgreSQL Schemas
Summary
This PR contributes the foundational data model infrastructure for the dynamic profiling feature, including PostgreSQL database schemas and Python API models.
What's Included
- Database Schema (scripts/setup/postgres/)
- API Models (src/gprofiler/backend/models/)
- Documentation (docs/)
- Tests (src/tests/)

Benefits